
feat: add Dockerfile template #400

Merged

trym-b commented Aug 10, 2023


# Motivation

More and more applications are being run natively in containers, for a
variety of reasons. This commit is the initial attempt to provide
containerized support for `SolrWayback`.

# Usage

To get going, simply run the following commands:

```bash
docker build . --tag solrwayback
docker run --publish 8080:8080 --publish 8983:8983 \
  --volume <path/to/WARCs>:/host_dir --tty --interactive solrwayback bash
```
where `<path/to/WARCs>` is a directory that contains only `WARC` files and subdirectories.
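
Because the mounted directory should contain nothing but WARC files and directories, a quick check before `docker run` can catch stray files. This is a minimal sketch that is not part of the PR: the function name `check_warc_dir` is hypothetical, and it assumes WARC files are named `*.warc` or `*.warc.gz`.

```bash
# Hypothetical helper (not part of the Dockerfile): succeed only if DIR
# contains nothing but subdirectories and files named *.warc or *.warc.gz.
check_warc_dir() {
  local dir="$1" offenders
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 2
  fi
  # Every entry below DIR that is neither a directory nor a WARC file.
  offenders=$(find "$dir" -mindepth 1 ! -type d ! -name '*.warc' ! -name '*.warc.gz')
  if [ -n "$offenders" ]; then
    printf 'non-WARC entries found:\n%s\n' "$offenders" >&2
    return 1
  fi
  return 0
}
```

For example, `check_warc_dir /data/warcs && docker run ... --volume /data/warcs:/host_dir ...` only starts the container when the (example) directory `/data/warcs` passes the check.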

For more details, please refer to the comments in the `Dockerfile`.

# Implementation

The `Dockerfile` was created by following the instructions in the
top-level README. In addition, a few verification steps were added to
ensure that the container works as expected.

A simple test that uses `docker` to build the container has been added.

# Drawbacks

## No proper indexing test

The added test does not verify that indexing works as expected. This
should be done by adding some `WARC` files as test data, but this is
outside the scope of this commit.

## No automatic update to latest release

Whenever someone releases a new version of `SolrWayback`, there is no
reminder or automatic failure if the relevant people forget to update
the `Dockerfile`.
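
One way to address this drawback would be a CI step that fails when the pinned version differs from the latest release. The sketch below assumes, without having checked the actual `Dockerfile`, that the version is pinned by a line of the form `ARG SOLRWAYBACK_VERSION=<version>` or `ENV SOLRWAYBACK_VERSION=<version>`; `pinned_version` and `check_pinned_version` are hypothetical names.

```bash
# Hypothetical CI guard. Assumption: the Dockerfile pins the release with
#   ARG SOLRWAYBACK_VERSION=<version>   or   ENV SOLRWAYBACK_VERSION=<version>
pinned_version() {
  local dockerfile="${1:-Dockerfile}"
  # Print the first pinned version found, or nothing if there is none.
  sed -n -E 's/^(ARG|ENV)[[:space:]]+SOLRWAYBACK_VERSION=([^[:space:]]+).*/\2/p' "$dockerfile" | head -n 1
}

# Return 1 (and explain why) when the pinned version differs from LATEST.
check_pinned_version() {
  local dockerfile="$1" latest="$2" pinned
  pinned=$(pinned_version "$dockerfile")
  if [ "$pinned" != "$latest" ]; then
    echo "Dockerfile pins SolrWayback ${pinned:-<none>}, latest release is $latest" >&2
    return 1
  fi
  return 0
}
```

How CI obtains the latest version (for example from the GitHub REST API's latest-release endpoint for `netarchivesuite/solrwayback`) and how release tags map to bundle versions are left open here.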

# Future work

## Index preservation

The `Dockerfile` does not currently preserve the created index, so it
needs to be copied out of the container manually in order to preserve
it. With the latest released `SolrWayback` bundle, the index can be
found here:
`unpacked-bundle/solrwayback_package_4.4.2/solr-7.7.3/server/solr/configsets/netarchivebuilder/netarchivebuilder_data/index/`
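
Until that is in place, the index can be copied to the host with `docker cp`. The sketch below wraps that command; `copy_index` is a hypothetical name, and it assumes the bundle was unpacked under `/root` (override with `BUNDLE_HOME`) and ships Solr 7.7.3 as in the path above (override with `SOLR_VERSION`). Copying is safest once indexing has finished, since the index files change while documents are being added.

```bash
# Hypothetical helper: copy the netarchivebuilder index out of a container.
# Assumptions: the bundle is unpacked under BUNDLE_HOME (default /root) and
# contains Solr SOLR_VERSION (default 7.7.3, as in the 4.4.2 bundle).
copy_index() {
  local container="$1" sw_version="$2" dest="$3"
  local bundle_home="${BUNDLE_HOME:-/root}"
  local solr_version="${SOLR_VERSION:-7.7.3}"
  local index="${bundle_home}/unpacked-bundle/solrwayback_package_${sw_version}/solr-${solr_version}/server/solr/configsets/netarchivebuilder/netarchivebuilder_data/index/"
  # DOCKER may be set to "echo" for a dry run that only prints the command.
  "${DOCKER:-docker}" cp "${container}:${index}" "$dest"
}
```

Usage: `copy_index <container> 4.4.2 ./solr-index`, where `<container>` is the name or ID shown by `docker ps`.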

## Custom configuration of `properties`

There is currently no way of using your own `solrwayback.properties` and
`solrwaybackweb.properties`, which is essential for using the correct
branding.
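
A possible approach is to bind-mount the two files read-only into the container. This sketch rests on assumptions the PR does not confirm: that SolrWayback reads both files from the home directory of the user running it, and that this is `/root` in the container (override with `CONTAINER_HOME`). `run_solrwayback` is a hypothetical wrapper around the `docker run` command from the Usage section; host paths are safest given as absolute paths.

```bash
# Hypothetical wrapper around the docker run command from the Usage section
# that also mounts custom property files read-only (:ro).
# Assumption: SolrWayback reads the files from CONTAINER_HOME (default /root).
run_solrwayback() {
  local warc_dir="$1" props_dir="$2"
  local home="${CONTAINER_HOME:-/root}"
  "${DOCKER:-docker}" run --publish 8080:8080 --publish 8983:8983 \
    --volume "${warc_dir}:/host_dir" \
    --volume "${props_dir}/solrwayback.properties:${home}/solrwayback.properties:ro" \
    --volume "${props_dir}/solrwaybackweb.properties:${home}/solrwaybackweb.properties:ro" \
    --tty --interactive solrwayback bash
}
```

Both property files must exist on the host before the container starts; with `--volume`, Docker creates an empty directory in place of a missing host path. Because the mounts are read-only, the files are edited on the host rather than inside the container.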

thomasegense commented Aug 14, 2023

I will try to have a look at this later this week. It is important that you can add your own WARC files, as you have documented. There is no need to include default WARC files: if users want to try SolrWayback, they can just go to the live web and see one of the many running installations with real data. So this is all good.

How do you see the solrwayback.log and solrwayback_error.log files? Can you open a terminal inside the docker? Then you should also be able to edit the two property files if needed?

I tried running the GitHub workflow on the branch and it passed. For some reason the old Docker example (which I deleted) made GitHub workflow runs fail, so we never knew whether it could even compile. I never understood why GitHub would care about a Dockerfile in the root folder, but this PR seems to cause no problems.

Also, using the SolrWayback bundle to install everything makes things much easier. The old Docker setup did a full git checkout of SolrWayback that was built, and the WAR file was deployed into Tomcat. But I guess you can just have a git checkout outside the Docker container and then easily copy the WAR file into the Tomcat inside the container? This could be useful for people trying to create PRs etc.


trym-b commented Aug 14, 2023

> How do you see the solrwayback.log and solrwayback_error.log files? Can you open a terminal inside the docker? Then you should also be able to edit the two property files if needed?

If you follow the instructions provided in the `Dockerfile`, you can easily inspect the log files using `less`, `cat`, `nano` or something else. I am not currently sure where they live exactly, but if they appear in the bundle itself, you can find them somewhere in `~/unpacked-bundle/solrwayback_package_${SOLRWAYBACK_VERSION}/`.
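
Since the exact location of the logs may differ between bundle versions, searching for them is a simple option. A minimal sketch: `find_solrwayback_logs` is a hypothetical name, and the default search root assumes the bundle was unpacked into `~/unpacked-bundle` as described above.

```bash
# Hypothetical helper: print the paths of solrwayback.log and
# solrwayback_error.log found below ROOT (default: ~/unpacked-bundle).
find_solrwayback_logs() {
  local root="${1:-$HOME/unpacked-bundle}"
  find "$root" -type f \( -name 'solrwayback.log' -o -name 'solrwayback_error.log' \)
}
```

Inside the container, `find_solrwayback_logs | xargs tail -n 50` then shows the last lines of each log found.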

> But I guess you can just have a git checkout outside the docker container and then easy copy the WAR-file into the docker tomcat?

My approach is to not deal with git at all, as I do not have a need for it. I simply use the latest bundle that has been released.

If you take a look at the Dockerfile you will find that I added some lines that explain how you are supposed to get WARC files into the container:

```dockerfile
# To run SolrWayback, you need to launch it with the following parameters
# docker run --publish 8080:8080 --publish 8983:8983  --volume <path/to/WARCs>:/host_dir --tty --interactive solrwayback bash
# where <path/to/WARCs> is a file path that only contains WARC files and directories.
```

You could use `COPY` to copy the files into the image, but that would increase the size of the image, so instead I recommend mounting the relevant directory so that the files can be used directly.

This is of course not a silver bullet, as you might have a different use case where it makes sense to copy files instead of mounting them. But that is not the point of this pull request; it is simply meant as a sample that you can start modifying. For example, there is a use case for mounting a different directory for exporting the created index, which might be added as a follow-up.

trym-b added a commit to nlnwa/solrwayback-adaption that referenced this pull request Aug 15, 2023
In preparation to creating a container image for deployment in a
cluster, this commit adds the initial `Dockerfile` to the repository.

This is in no way a final version, but it is a good starting point.

The `Dockerfile` was taken from this pull request:
netarchivesuite/solrwayback#400
trym-b added a commit to nlnwa/solrwayback-adaption that referenced this pull request Aug 16, 2023
# Motivation

To avoid accidentally breaking the docker image while developing, we
should build the image in CI. The code was copied from pull request:
netarchivesuite/solrwayback#400

thomasegense left a comment

Working as expected.
Thanks.

thomasegense merged commit 525aec8 into netarchivesuite:master Aug 17, 2023
3 checks passed
trym-b deleted the feat/add-docker-template branch August 17, 2023 10:28